RECEIVED

CENTRAL FAX CENTER

SEP 18 2006

Amendments to the Claims

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 

Listing of Claims:

1 1. (currently amended): A system for grouping clusters of 

2 semantically scored documents electronically stored in a data corpus, comprising: 

3 a scoring module determining a score, which is assigned to at least one

4 concept that has been extracted from a plurality of electronically-stored

5 documents, wherein the score is based on at least one of a frequency of

6 occurrence of the at least one concept within at least one such document, a

7 concept weight, a structural weight, and a corpus weight; [[and]]

8 a clustering module forming clusters of the documents by [[applying]]

9 evaluating the score for the at least one concept [[to]] of each document for a best

10 fit [[criterion for each such document]] to the clusters and assigning each document

11 to the cluster with the best fit; and

12 a threshold module dynamically determining a threshold for each cluster

13 based on similarities between the documents grouped into the cluster and a center

14 of the cluster, and reassigning those documents having similarities outside the

15 threshold.
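
As a rough illustration of the clustering and threshold elements recited in Claim 1, the following Python sketch assigns a document to its best-fit cluster and flags documents that fall outside a per-cluster threshold. Cosine similarity as the fit measure, an arithmetic-mean cluster center, and a threshold of mean similarity minus one standard deviation are all assumptions made for this example; the claim does not fix any of them, and every function name here is hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity of two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def center(vectors):
    """Component-wise mean of the vectors in one cluster (assumed center)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def assign_best_fit(doc, centers):
    """Index of the cluster whose center is most similar to the document."""
    return max(range(len(centers)), key=lambda k: cosine(doc, centers[k]))


def outliers(cluster_docs, spread=1.0):
    """Indices of documents whose similarity to the center is below the
    threshold; the threshold is recomputed from the cluster's own
    similarities (mean minus `spread` standard deviations, an assumption)."""
    c = center(cluster_docs)
    sims = [cosine(d, c) for d in cluster_docs]
    threshold = mean(sims) - spread * pstdev(sims)
    return [i for i, s in enumerate(sims) if s < threshold]
```

Documents returned by `outliers` would then be candidates for reassignment, for example by calling `assign_best_fit` against the remaining cluster centers.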

1 2. (original): A system according to Claim 1, further comprising: 

2 the scoring module calculating the score as a function of a summation of

3 at least one of the frequency of occurrence, the concept weight, the structural 

4 weight, and the corpus weight of the at least one concept. 

1 3. (original): A system according to Claim 2, further comprising: 

2 a compression module compressing the score through logarithmic 

3 compression. 

1 4. (original): A system according to Claim 1, further comprising: 
2 the scoring module calculating the concept weight as a function of a

3 number of terms comprising the at least one concept.

1 5. (original): A system according to Claim 1, further comprising:

2 the scoring module calculating the structural weight as a function of a

3 location of the at least one concept within the at least one such document.

1 6. (original): A system according to Claim 1, further comprising:

2 the scoring module calculating the corpus weight as a function of a 

3 reference count of the at least one concept over the plurality of documents. 

1 7. (original): A system according to Claim 1, further comprising: 

2 the scoring module forming the score assigned to the at least one concept 



3 to a normalized score vector for each such document, determining a similarity 

4 between the normalized score vector for each such document as an inner product 

5 of each normalized score vector, and applying the similarity to the best fit 

6 criterion.
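
The normalized score vector and inner-product similarity of Claim 7 can be sketched as follows. This is a minimal example assuming Euclidean (L2) normalization and a sparse dictionary keyed by concept; the claim does not specify either, and the function names are hypothetical.

```python
import math


def normalize(scores):
    """Scale a {concept: score} mapping to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {concept: v / norm for concept, v in scores.items()}


def similarity(vec_a, vec_b):
    """Inner product of two normalized score vectors; concepts missing
    from either document contribute nothing."""
    return sum(w * vec_b[c] for c, w in vec_a.items() if c in vec_b)
```

With unit-length vectors the inner product equals the cosine of the angle between the two documents, so identical concept profiles give a similarity near 1 and documents with no shared concepts give 0.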

1 8. (original): A system according to Claim 1, further comprising: 

2 the clustering module evaluating a set of candidate seed documents 

3 selected from the plurality of documents, identifying a set of seed documents by 

4 applying the score for the at least one concept to a best fit criterion for each such 

5 candidate seed document, and basing the best fit criterion on the score of each 

6 such seed document. 



1 9. (currently amended): A method for grouping clusters of

2 semantically scored documents electronically stored in a data corpus, comprising:

3 determining a score, which is assigned to at least one concept that has

4 been extracted from a plurality of electronically-stored documents, wherein the

5 score is based on at least one of a frequency of occurrence of the at least one

6 concept within at least one such document, a concept weight, a structural weight,

7 and a corpus weight; [[and]]

8 forming logically-grouped clusters of the documents by [[applying]]

9 evaluating the score for the at least one concept [[to]] of each document for a best

10 fit [[criterion for each such document]] to the clusters and assigning each document

11 to the cluster with the best fit;

12 dynamically determining a threshold for each cluster based on similarities

13 between the documents grouped into the cluster and a center of the cluster; and

14 reassigning those documents having similarities outside the threshold. 

1 10. (original): A method according to Claim 9, further comprising:

2 calculating the score as a function of a summation of at least one of the

3 frequency of occurrence, the concept weight, the structural weight, and the corpus

4 weight of the at least one concept.

1 11. (original): A method according to Claim 10, further comprising:

2 compressing the score through logarithmic compression.

1 12. (original): A method according to Claim 9, further comprising:

2 calculating the concept weight as a function of a number of terms

3 comprising the at least one concept. 

1 13. (original): A method according to Claim 9, further comprising:

2 calculating the structural weight as a function of a location of the at least

3 one concept within the at least one such document. 

1 14. (original): A method according to Claim 9, further comprising:

2 calculating the corpus weight as a function of a reference count of the at

3 least one concept over the plurality of documents. 

1 15. (original): A method according to Claim 9, further comprising:

2 forming the score assigned to the at least one concept to a normalized

3 score vector for each such document;

4 determining a similarity between the normalized score vector for each

5 such document as an inner product of each normalized score vector; and

6 applying the similarity to the best fit criterion.

1 16. (original): A method according to Claim 9, further comprising:

2 evaluating a set of candidate seed documents selected from the plurality of

3 documents;

4 identifying a set of seed documents by applying the score for the at least 

5 one concept to a best fit criterion for each such candidate seed document; and 

6 basing the best fit criterion on the score of each such seed document. 

1 17. (currently amended): A computer-readable storage medium

2 holding code for [[performing the method of Claim 9]] grouping clusters of

3 semantically scored documents electronically stored in a data corpus, comprising:

4 code for determining a score, which is assigned to at least one concept that

5 has been extracted from a plurality of electronically-stored documents, wherein

6 the score is based on at least one of a frequency of occurrence of the at least one

7 concept within at least one such document, a concept weight, a structural weight,

8 and a corpus weight;

9 code for forming logically-grouped clusters of the documents by

10 evaluating the score for the at least one concept of each document for a best fit to

11 the clusters and assigning each document to the cluster with the best fit;

12 code for dynamically determining a threshold for each cluster based on

13 similarities between the documents grouped into the cluster and a center of the

14 cluster; and

15 code for reassigning those documents having similarities outside the

16 threshold.

1 18. (currently amended): A system for providing efficient document

2 scoring of concepts within [[a]] and clustering of documents in an electronically-

3 stored document set, comprising:

4 a scoring module scoring a document in an electronically-stored document

5 set, comprising:

6 a frequency module determining a frequency of occurrence of at

7 least one concept within a [[document retrieved from the document set; and]]

8 document;

9 a concept weight module analyzing a concept weight reflecting a 

10 specificity of meaning for the at least one concept within the document; 

11 a structural weight module analyzing a structural weight reflecting

12 a degree of significance based on structural location within the document for the 

13 at least one concept; 

14 a corpus weight module analyzing a corpus weight inversely 

15 weighing a reference count of occurrences for the at least one concept within the 

16 document; and 

17 a scoring evaluation module evaluating a score to be associated

18 with the at least one concept as a function of the frequency, concept weight,

19 structural weight, and corpus [[weight]] weight; and

20 a clustering module grouping the documents by score into a plurality of

21 clusters, comprising:

22 a cluster seed module identifying candidate seed documents, which

23 are each assigned as a seed document into a cluster with a center most similar to

24 the seed document and assigning each non-seed document to the cluster with the

25 best fit; and

26 a threshold module dynamically determining a threshold for each

27 cluster based on similarities between the documents in each cluster and the cluster

28 center, and reassigning the documents with similarities outside the threshold.

1 19. (currently amended): A system according to Claim 18, further 

2 comprising: 

3 the scoring module evaluating the score [[substantially]] in accordance with

4 the formula:

6 where $S_i$ comprises the score, $f_{ij}$ comprises the frequency, $0 < cw_{ij} \le 1$ comprises

7 the concept weight, $0 < sw_{ij} \le 1$ comprises the structural weight, and $0 < rw_{ij} \le 1$

8 comprises the corpus weight for occurrence j of concept i.

1 20. (currently amended): A system according to Claim 19, further

2 comprising:

3 the concept weight module evaluating the concept weight [[substantially]] in

4 accordance with the formula:

5 $$cw_{ij} = \begin{cases} 0.25 + (0.25 \times t_{ij}), & 1 \le t_{ij} \le 3 \\ 0.25 + (0.25 \times [7 - t_{ij}]), & 4 \le t_{ij} \le 6 \\ 0.25, & t_{ij} \ge 7 \end{cases}$$

6 where $cw_{ij}$ comprises the concept weight and $t_{ij}$ comprises a number of terms for

7 occurrence j of each such concept i.
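
The piecewise concept weight of Claim 20 can be sketched in Python as below. The breakpoints follow a reading of the scanned formula (rising over one to three terms, falling over four to six, flat at seven or more), which is an assumption where the copy is hard to read; the function name and the rejection of counts below one are likewise choices made for this example.

```python
def concept_weight(num_terms: int) -> float:
    """Concept weight as a function of the number of terms in a concept."""
    if num_terms < 1:
        raise ValueError("a concept must contain at least one term")
    if num_terms <= 3:
        # Weight rises with the term count for short concepts.
        return 0.25 + 0.25 * num_terms
    if num_terms <= 6:
        # Weight falls again as the concept grows longer.
        return 0.25 + 0.25 * (7 - num_terms)
    # Concepts of seven or more terms receive the minimum weight.
    return 0.25
```

Under this reading the weight peaks at 1.0 for three- and four-term concepts and never leaves the interval from 0.25 to 1.0.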

1 21. (currently amended): A system according to Claim 19, further 

2 comprising: 

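3 the structural weight module evaluating the structural weight [[substantially]]

4 in accordance with the formula:

5 $$sw_{ij} = \begin{cases} 1.0, & \text{if } (j \approx \text{SUBJECT}) \\ 0.8, & \text{if } (j \approx \text{HEADING}) \\ 0.7, & \text{if } (j \approx \text{SUMMARY}) \\ 0.5, & \text{if } (j \approx \text{BODY}) \\ 0.1, & \text{if } (j \approx \text{SIGNATURE}) \end{cases}$$

6 where $sw_{ij}$ comprises the structural weight for occurrence j of each such concept i.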
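
A lookup-table sketch of the structural weight of Claim 21 follows. The five location labels come from the claim; the BODY value of 0.5 is an assumption, because that entry is hard to read in the scanned copy, and the case-insensitive matching and function name are choices made for this example only.

```python
# Structural weight per document location. The BODY value is assumed,
# since that entry is not clearly legible in the scanned claim.
STRUCTURAL_WEIGHTS = {
    "SUBJECT": 1.0,
    "HEADING": 0.8,
    "SUMMARY": 0.7,
    "BODY": 0.5,
    "SIGNATURE": 0.1,
}


def structural_weight(location: str) -> float:
    """Return the weight for the location where a concept occurrence was found."""
    key = location.strip().upper()
    if key not in STRUCTURAL_WEIGHTS:
        raise ValueError(f"unknown structural location: {location!r}")
    return STRUCTURAL_WEIGHTS[key]
```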

1 22. (currently amended): A system according to Claim 19, further

2 comprising:

3 the corpus weight module evaluating the corpus weight [[substantially]] in

4 accordance with the formula:



-8- 

PA6E13/3rRCVDAT9/1812006 6:09:43 PM (Eastern D^^^^ 



09/18/2006 15:06 2053813999 



PATRICK JS INOUYE PS 



PAGE 



Response to Supplemental Office Action 
Docket No. 013.0207.US.UTL 



5 $$rw_{ij} = \begin{cases} \text{[illegible]} \\ 1.0, & r_{ij} \le M \end{cases}$$

6 where $rw_{ij}$ comprises the corpus weight, $r_{ij}$ comprises a reference count for

7 occurrence j of each such concept i, T comprises a total number of reference 

8 counts of documents in the document set, and M comprises a maximum reference 

9 count of documents in the document set. 

1 23. (currently amended): A system according to Claim 19, further 

2 comprising: 

3 a compression module compressing the score [[substantially]] in accordance

4 with the formula:

5 $$S_i' = \log(S_i + 1)$$

6 where $S_i'$ comprises the compressed score for each such concept i.
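
The logarithmic compression of Claim 23 is small enough to show directly. This sketch assumes the natural logarithm, since the claim does not state a base, and uses a hypothetical function name.

```python
import math


def compress_score(score: float) -> float:
    """Compress a raw concept score as log(score + 1).

    The natural logarithm is an assumption; the claim does not name a base.
    """
    if score < 0:
        raise ValueError("scores are expected to be non-negative")
    return math.log(score + 1.0)
```

Adding one before taking the logarithm keeps a zero score at zero and damps very large scores so that a single frequent concept does not dominate a document's score vector.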

1 24. (original): A system according to Claim 18, further comprising:

2 a global stop concept vector cache maintaining concepts and terms; and

3 a filtering module filtering selection of the at least one concept based on

4 the concepts and terms maintained in the global stop concept vector cache.

1 25. (original): A system according to Claim 18, further comprising:

2 a parsing module identifying terms within at least one document in the 

3 document set, and combining the identified terms into one or more of the 

4 concepts. 

1 26. (original): A system according to Claim 25, further comprising:

2 the parsing module structuring each such identified term in the one or 

3 more concepts into canonical concepts comprising at least one of word root, 

4 character case, and word ordering. 

1 27. (original): A system according to Claim 25, wherein at least one of

2 nouns, proper nouns and adjectives are included as terms.

1 28. (original): A system according to Claim 18, further comprising:

2 a plurality of candidate seed documents; 

3 a similarity module determining a similarity between each pair of a 

4 candidate seed document and a cluster center; 

5 a clustering module designating each such candidate seed document 

6 separated from substantially all cluster centers with such similarity being 

7 sufficiently distinct as a seed document, and grouping each such candidate seed 

8 document not being sufficiently distinct into a cluster with a nearest cluster 

9 center. 
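
The seed selection of Claim 28 can be sketched as a single greedy pass over the candidates. The assumptions here are that each seed document stands in for its cluster center, that "sufficiently distinct" means a similarity below a fixed cutoff (the 0.5 default is hypothetical), and that similarity is a plain inner product of score vectors.

```python
def dot(a, b):
    """Inner product of two equal-length score vectors."""
    return sum(x * y for x, y in zip(a, b))


def pick_seeds(candidates, min_similarity=0.5, sim=dot):
    """Group candidate seed documents into {seed_index: [member_indices]}.

    A candidate becomes a new seed when it is less similar than
    `min_similarity` to every existing seed; otherwise it joins the
    cluster of the nearest seed.
    """
    seeds = []
    clusters = {}
    for idx, doc in enumerate(candidates):
        sims = [sim(doc, candidates[s]) for s in seeds]
        if not sims or max(sims) < min_similarity:
            seeds.append(idx)
            clusters[idx] = [idx]
        else:
            nearest = seeds[max(range(len(sims)), key=sims.__getitem__)]
            clusters[nearest].append(idx)
    return clusters
```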

1 29. (original): A system according to Claim 28, further comprising:

2 a plurality of non-seed documents;

3 the similarity module determining the similarity between each non-seed

4 document and each cluster center; and

5 the clustering module grouping each such non-seed document into a

6 cluster having a best fit, subject to a minimum fit criterion. 

1 30. (original): A system according to Claim 29, further comprising: 

2 a normalized score vector for each document comprising the score 

3 associated with the at least one concept for each such concept occurring within 

4 the document; and 

5 the similarity module determining the similarity as a function of the 

6 normalized score vector associated with the at least one concept for each such 

7 document.

1 31. (currently amended): A system according to Claim 30, further

2 comprising:

3 the similarity module calculating the similarity [[substantially]] in accordance

4 with the formula:

6 where $\cos \sigma_{AB}$ comprises a similarity between a document A and a document B,

7 $\vec{S}_A$ comprises a score vector for document A, and $\vec{S}_B$ comprises a score vector for

8 document B.

1 Claims 32-34 (canceled). 

1 35. (currently amended): A method for providing efficient document

2 scoring of concepts within [[a]] and clustering of documents in an electronically-

3 stored document set, comprising:

4 scoring a document in an electronically-stored document set, comprising:

5 determining a frequency of occurrence of at least one concept

6 within a [[document retrieved from the document set; and]] document;

7 analyzing a concept weight reflecting a specificity of meaning for

8 the at least one concept within the document;

9 analyzing a structural weight reflecting a degree of significance

10 based on structural location within the document for the at least one concept;

11 analyzing a corpus weight inversely weighing a reference count of

12 occurrences for the at least one concept within the document; and

13 evaluating a score to be associated with the at least one concept as

14 a function of the frequency, concept weight, structural weight, and corpus [[weight]]

15 weight; and

16 grouping the documents by score into a plurality of clusters, comprising:

17 identifying candidate seed documents, which are each assigned as

18 a seed document into a cluster with a center most similar to the seed document;

19 assigning each non-seed document to the cluster with the best fit;

20 dynamically determining a threshold for each cluster based on

21 similarities between the documents in each cluster and the cluster center; and

22 reassigning the documents with similarities outside the threshold.
1 36. (currently amended): A method according to Claim 35, further

2 comprising:

3 evaluating the score [[substantially]] in accordance with the formula:

5 where $S_i$ comprises the score, $f_{ij}$ comprises the frequency, $0 < cw_{ij} \le 1$ comprises

6 the concept weight, $0 < sw_{ij} \le 1$ comprises the structural weight, and $0 < rw_{ij} \le 1$

7 comprises the corpus weight for occurrence j of concept i.

1 37. (currently amended): A method according to Claim 36, further 

2 comprising: 

3 evaluating the concept weight [[substantially]] in accordance with the

4 formula:

5 $$cw_{ij} = \begin{cases} 0.25 + (0.25 \times t_{ij}), & 1 \le t_{ij} \le 3 \\ 0.25 + (0.25 \times [7 - t_{ij}]), & 4 \le t_{ij} \le 6 \\ 0.25, & t_{ij} \ge 7 \end{cases}$$

6 where $cw_{ij}$ comprises the concept weight and $t_{ij}$ comprises a number of terms for

7 occurrence j of each such concept i.

1 38. (currently amended): A method according to Claim 36, further

2 comprising:

3 evaluating the structural weight [[substantially]] in accordance with the

4 formula:



6 where $sw_{ij}$ comprises the structural weight for occurrence j of each such concept i.
39. (currently amended): A method according to Claim 36, further
comprising:

evaluating the corpus weight [[substantially]] in accordance with the formula:



where $rw_{ij}$ comprises the corpus weight, $r_{ij}$ comprises a reference count for
occurrence j of each such concept i, T comprises a total number of reference
counts of documents in the document set, and M comprises a maximum reference 
count of documents in the document set. 

40. (currently amended): A method according to Claim 36, further
comprising:

compressing the score [[substantially]] in accordance with the formula:



where $S_i'$ comprises the compressed score for each such concept i.

41. (original): A method according to Claim 35, further comprising:
maintaining concepts and terms in a global stop concept vector cache; and
filtering selection of the at least one concept based on the concepts and 

terms maintained in the global stop concept vector cache. 

42. (original): A method according to Claim 35, further comprising: 
identifying terms within at least one document in the document set; and 
combining the identified terms into one or more of the concepts. 

43. (original): A method according to Claim 42, further comprising:
structuring each such identified term in the one or more concepts into 

canonical concepts comprising at least one of word root, character case, and word 
ordering. 
1 44. (original): A method according to Claim 42, further comprising:

2 including as terms at least one of nouns, proper nouns and adjectives.

1 Claim 45 (canceled).

1 46. (currently amended): A method according to Claim [[45]] 35,

2 further comprising:

3 identifying a plurality of non-seed documents;

4 determining the similarity between each non-seed document and each

5 cluster center; and

6 grouping each such non-seed document into a cluster with a best fit, 

7 subject to a minimum fit criterion. 

1 47. (original): A method according to Claim 46, further comprising: 

2 forming a normalized score vector for each document comprising the 

3 score associated with the at least one concept for each such concept occurring

4 within the document; and

5 determining the similarity as a function of the normalized score vector

6 associated with the at least one concept for each such document.

1 48. (currently amended): A method according to Claim 47, further

2 comprising:

3 calculating the similarity [[substantially]] in accordance with the formula:

5 where $\cos \sigma_{AB}$ comprises a similarity between a document A and a document

6 B, $\vec{S}_A$ comprises a score vector for document A, and $\vec{S}_B$ comprises a score vector for

7 document B.

1 Claims 49-51 (canceled).






1 52. (currently amended): A computer-readable storage medium

2 holding code for [[performing the method of Claim 35]] providing efficient

3 document scoring of concepts within [[a]] and clustering of documents in an

4 electronically-stored document set, comprising:

5 code for scoring a document in an electronically-stored document set,

6 comprising:

7 code for determining a frequency of occurrence of at least one

8 concept within a document;

9 code for analyzing a concept weight reflecting a specificity of

10 meaning for the at least one concept within the document;

11 code for analyzing a structural weight reflecting a degree of

12 significance based on structural location within the document for the at least one

13 concept;

14 code for analyzing a corpus weight inversely weighing a reference

15 count of occurrences for the at least one concept within the document; and

16 code for evaluating a score to be associated with the at least one

17 concept as a function of the frequency, concept weight, structural weight, and

18 corpus weight; and

19 code for grouping the documents by score into a plurality of clusters,

20 comprising:

21 code for identifying candidate seed documents, which are each

22 assigned as a seed document into a cluster with a center most similar to the seed

23 document;

24 code for assigning each non-seed document to the cluster with the

25 best fit;

26 code for dynamically determining a threshold for each cluster

27 based on similarities between the documents in each cluster and the cluster center;

28 and

29 code for reassigning the documents with similarities outside the

30 threshold.
1 53. (currently amended): An apparatus for providing efficient

2 document scoring of concepts within [[a]] and clustering of documents in an

3 electronically-stored document set, comprising:

4 means for scoring a document in an electronically-stored document set,

5 comprising:

6 means for determining a frequency of occurrence of at least one

7 concept within a [[document retrieved from the document set; and]] document;

8 means for analyzing a concept weight reflecting a specificity of

9 meaning for the at least one concept within the document;

10 means for analyzing a structural weight reflecting a degree of

11 significance based on structural location within the document for the at least one

12 concept;

13 means for analyzing a corpus weight inversely weighing a

14 reference count of occurrences for the at least one concept within the document;

15 and

16 means for evaluating a score to be associated with the at least one

17 concept as a function of the frequency, concept weight, structural weight, and

18 corpus [[weight]] weight; and

19 means for grouping the documents by score into a plurality of clusters,

20 comprising:

21 means for identifying candidate seed documents, which are each

22 assigned as a seed document into a cluster with a center most similar to the seed

23 document;

24 means for assigning each non-seed document to the cluster with

25 the best fit;

26 means for dynamically determining a threshold for each cluster

27 based on similarities between the documents in each cluster and the cluster center;

28 and

29 means for reassigning the documents with similarities outside the

30 threshold.